Team Name: YES
This section of the code tidies the data before analysis:
##knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
## -- Attaching packages ------------------------------------------------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 2.2.1 v purrr 0.2.4
## v tibble 1.4.2 v dplyr 0.7.4
## v tidyr 0.8.0 v stringr 1.3.0
## v readr 1.1.1 v forcats 0.3.0
## -- Conflicts ---------------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(readxl)
passengers <- read_xls("WebAirport_FY_1986-2017.xls", sheet=3, skip =6)
passengers <- passengers[-(1:32) ,] ## Deletes first 32 rows of "total airports""
passengers <- passengers[-c(1,4,7,10:14)] ## Deletes the columns containing "Total"
new_pass <- gather(passengers, key = temp, value = no_of_passengers, -AIRPORT, -Year, na.rm = TRUE)
new_pass[new_pass=='INBOUND__1'] <- 'IN-INTERNATIONAL'
new_pass[new_pass=='OUTBOUND__1'] <- 'OUT-INTERNATIONAL'
new_pass[new_pass=='INBOUND'] <- 'IN-DOMESTIC'
new_pass[new_pass=='OUTBOUND'] <- 'OUT-DOMESTIC'
final_pass <- new_pass %>%
separate(temp, c("bound","type_of_flight"))
This section of the code uses the tidy data to create a bar plot of the flight data into types of flight faceted by year
## Bar plot of type of flights facetted by year
ggplot(final_pass, aes(x =type_of_flight, fill=type_of_flight)) +
geom_bar(aes(weight=no_of_passengers),position= "dodge") +
theme(text = element_text(size = 40))+
facet_wrap(~ Year,ncol=3)
This section of the code creates a scatter plot of the domestic inbound and outbound flights across all airports in Australia.
## Scatter plot of domestic inbound vs. domestic outbound flights facetted by year
domestic <- final_pass %>% filter(type_of_flight %in% c("DOMESTIC"))
dom_in <- new_pass[new_pass$temp == "IN-DOMESTIC", c(4)]
dom_out <- new_pass[new_pass$temp == "OUT-DOMESTIC", c(4)]
dom_temp <- domestic[1:(nrow(domestic)/2),1:2]
dom_temp["IN"] <- dom_in
dom_temp["OUT"] <- dom_out
ggplot(dom_temp, aes(x =IN , y = OUT , title(main = "Domestic inbound vs. outbound"))) +
geom_point(colour = "green", size=8) +
xlab("Number of Passengers InBound") +
ylab("Number of Passengers OutBound") +
theme(aspect.ratio = 1, text = element_text(size=50)) +
facet_wrap(~ Year, ncol=3)
The top 3 airports are Sydney, Melbourne, and Brisbane for the most number of domestic passengers.
## To extract the top 3 airports with the most passengers in the domestic inbounds and outbounds
top_dom_inbound = top_n(final_pass %>% filter(Year %in% c("2015-16")) %>% filter(bound %in% c("IN")) %>% filter(type_of_flight %in% c("DOMESTIC")),3)
## Selecting by no_of_passengers
top_dom_outbound = top_n(final_pass %>% filter(Year %in% c("2015-16")) %>% filter(bound %in% c("OUT")) %>% filter(type_of_flight %in% c("DOMESTIC")),3)
## Selecting by no_of_passengers
top_dom_inbound[3:1,]
## # A tibble: 3 x 5
## AIRPORT Year bound type_of_flight no_of_passengers
## <chr> <chr> <chr> <chr> <dbl>
## 1 SYDNEY 2015-16 IN DOMESTIC 13299167.
## 2 MELBOURNE 2015-16 IN DOMESTIC 12242045.
## 3 BRISBANE 2015-16 IN DOMESTIC 8506091.
top_dom_outbound[3:1,]
## # A tibble: 3 x 5
## AIRPORT Year bound type_of_flight no_of_passengers
## <chr> <chr> <chr> <chr> <dbl>
## 1 SYDNEY 2015-16 OUT DOMESTIC 13249433.
## 2 MELBOURNE 2015-16 OUT DOMESTIC 12183981.
## 3 BRISBANE 2015-16 OUT DOMESTIC 8491959.
This section of the code creates a scatter plot of the international inbound and outbound flights across all airports in Australia. THe top 3 airports are Sydney, Melbourne, and Brisbane for the most number of international passengers.
## Scatter plot of international inbound vs. international outbound flights facetted by year
international <- final_pass %>% filter(type_of_flight %in% c("INTERNATIONAL"))
inter_in = new_pass[new_pass$temp == "IN-INTERNATIONAL", c(4)]
inter_out = new_pass[new_pass$temp == "OUT-INTERNATIONAL", c(4)]
inter_temp = international[1:(nrow(international)/2),1:2]
inter_temp["IN"] <- inter_in
inter_temp["OUT"] <- inter_out
ggplot(inter_temp, aes(x=IN, y = OUT, title(main = "International inbound vs. outbound flights"))) +
geom_point(colour = "green", size=8) +
xlab("Number of Passengers InBound") +
ylab("Number of Passengers OutBound") +
theme(aspect.ratio = 1, text = element_text(size=50)) +
facet_wrap(~ Year, ncol=3)
The 3 graphs that are generated in R depicts the amount of passengers that travels inbound and outbound both internationally and domestically. The scatterplots depicts a linearly increasing graph as the number of passengers that travels inbound (both domestically and internationally) will be approximately the same number as the number of passengers travelling outbound for most airports. Some airports does not facilitate international travel, hence those airports generally have a smaller amount of domestic travellers compared to airports with international travel. Furthermore, numerical analysis from the graph highlights Sydney’s proportion of international passengers (to its domestic passengers) is dominant compared to other airports. This can be explained by the fact that QANTAS (the Australian airline representative) has its main office in Sydney, hence most of the operations are managed through Sydney airport. Lastly, the international scatter plot highlights the increasingly dominant airport (Sydney) which extends further away from the second busiest airport (Melbourne).